Skip to content

fix: eliminate N+1 query in search hydrate_results#713

Merged
groksrc merged 1 commit intomainfrom
fix/search-hydrate-n-plus-1
Apr 3, 2026
Merged

fix: eliminate N+1 query in search hydrate_results#713
groksrc merged 1 commit intomainfrom
fix/search-hydrate-n-plus-1

Conversation

@groksrc
Copy link
Copy Markdown
Member

@groksrc groksrc commented Apr 3, 2026

Summary

Fixes #707 — search hydrate_results made a separate get_entities_by_id() call per search result, creating an N+1 query pattern. On cloud (Neon PostgreSQL, ~300ms per round-trip), this added ~3s for 10 results.

  • Collect all unique entity IDs upfront, fetch in a single batch call, index into a dict
  • Fixes a correctness bug: from_entity previously used positional indexing (entities[0]) which was wrong when entity_id != from_id
  • Adds 11 focused unit tests proving single-call behavior, correct field mapping, dedup, and edge cases

Performance impact

Results Before (DB calls) After (DB calls) Estimated cloud latency saved
10 10 1 ~2.7s (9 × 300ms)
25 25 1 ~7.2s
50 50 1 ~14.7s

Test plan

  • 11 new tests in test_search_hydration.py — prove single DB call, ID dedup, correct field mapping per type, edge cases
  • Existing telemetry test updated and passing
  • Full search test suite: 272 passed, 0 failures
  • ruff lint + format clean
  • pyright clean

🤖 Generated with Claude Code

Replace per-result get_entities_by_id calls with a single batch fetch
using an ID-keyed lookup dict. Also fixes a correctness bug where
from_entity used positional indexing instead of from_id lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Claude finished @groksrc's task in 1m 2s —— View job


PR Review: fix/search-hydrate-n-plus-1

  • Read changed files
  • Analyze code quality & standards
  • Verify testing & coverage
  • Check architecture & performance
  • Post final review

Summary

Solid fix. The N+1 elimination is correct, the positional-index bug fix is real and well-documented, and the test suite is thorough. A few minor observations below.


Code Quality & Standards

Check Status
Follows Basic Memory coding conventions (CLAUDE.md)
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices ✅ (no ORM changes)
FastAPI/Typer conventions
100-character line length
No commented-out code blocks
Literate programming style (section headers, decision comments)

Core Change Analysis (src/basic_memory/api/v2/utils.py)

N+1 elimination is clean: all three ID fields (entity_id, from_id, to_id) are collected in a single pass, deduplicated via a set, and fetched in one get_entities_by_id call. ✅

Correctness bug fix is valid. The old code:

entity=entities[0].permalink if entities else None,
from_entity=entities[0].permalink if entities else None,   # BUG: same slot as entity
to_entity=entities[1].permalink if len(entities) > 1 else None,

Both entity and from_entity mapped to entities[0], which was only correct when entity_id == from_id. The new dict-based lookup (entities_by_id.get(result.from_id)) is correct. ✅

Minor observation — telemetry span wraps an empty-results no-op:

utils.py:191-200 — The fetch_entities telemetry scope is entered even when all_entity_ids is empty (emitting a result_count=0 span with no actual DB work). This is harmless but slightly misleading — the span implies a fetch happened. Consider nesting the scope inside the if all_entity_ids: block, or leaving it as-is since it's a minor issue.

EntityModel type annotation (entities_by_id: dict[int, EntityModel]) adds a model import to the utility. The annotation is accurate and helpful for type checking. ✅

Pre-existing style note (not introduced by this PR): from typing import Optional, List uses pre-3.10 style. Python 3.12+ allows list[...] / ... | None directly, but that's a separate refactor.


Testing & Coverage

Check Status
Unit tests for new logic ✅ 11 tests in test_search_hydration.py
100% coverage maintained
Edge cases covered
Telemetry test updated

The SpyEntityService pattern (tracking call count and args) is a good choice over mocking — it tests behavioral contracts without coupling to implementation details.

The test for the specific bug (test_relation_with_distinct_entity_and_from_ids) clearly documents what the old code got wrong. ✅

The FakeEntityService in test_utils_telemetry.py was correctly updated to return SimpleNamespace objects with .id fields, since the new code builds the lookup dict via {e.id: e for e in entities}. ✅


Security & Performance

Check Status
No hardcoded secrets
Input validation ✅ (unchanged, None guard via if eid is not None)
Performance improvement ✅ O(n) → O(1) DB calls
No sensitive data in logs

Compatibility

Check Status
Windows-compatible file path comparisons ✅ (no path comparisons added)
No emojis/unicode in console/log output

Verdict

LGTM. The change is correct, well-tested, and a meaningful improvement for cloud-hosted deployments. The one minor nit about the telemetry span wrapping an empty-results path is cosmetic and not a blocker.

@groksrc groksrc requested a review from phernandez April 3, 2026 17:37
@groksrc groksrc merged commit 733c4f7 into main Apr 3, 2026
26 checks passed
@groksrc groksrc deleted the fix/search-hydrate-n-plus-1 branch April 3, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Search hydrate_results N+1 query causes 3s+ latency on cloud

2 participants